experience memory
Large Language Models Are Semi-Parametric Reinforcement Learning Agents
As noted by Seifert et al. [1997], episodic memory of experiences from past episodes plays a crucial role in complex human decision-making processes [Suddendorf and Corballis, 2007]. By recollecting past episodes, humans can learn from successes in order to repeat them and from failures in order to avoid them.
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Asia > China > Hong Kong (0.04)
Should I Stay or Should I Go: A Learning Approach for Drone-based Sensing Applications
Polychronis, Giorgos, Koutsoubelias, Manos, Lalis, Spyros
Multicopter drones are becoming a key platform in several application domains, enabling precise on-the-spot sensing and/or actuation. We focus on the case where the drone must process the sensor data in order to decide, depending on the outcome, whether it needs to perform some additional action, e.g., more accurate sensing or some form of actuation. On the one hand, waiting for the computation to complete may waste time if it turns out that no further action is needed. On the other hand, if the drone starts moving toward the next point of interest before the computation ends, it may need to return to the previous point if some action must be taken. In this paper, we propose a learning approach that enables the drone to make informed decisions about whether or not to wait for the result of the computation, based on past experience gathered from previous missions. Through an extensive evaluation, we show that the proposed approach, when properly configured, outperforms several static policies by up to 25.8%, across a wide variety of scenarios, both where the probability of some action being required at a given point of interest remains stable and where it varies over time.
- Europe > Greece (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Information Technology (0.68)
- Transportation (0.47)
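A minimal sketch of the wait-or-go decision described in the abstract above, assuming a deliberately simple cost model: the drone hovers only when the known computation time is no worse than the expected detour cost of leaving early. The names (POIStats, should_wait), the Laplace-smoothed probability estimate, and the cost model are illustrative assumptions, not the authors' algorithm.

```python
from dataclasses import dataclass

@dataclass
class POIStats:
    """Running estimate of P(action required) at one point of interest."""
    actions: int = 1   # Laplace prior: one pseudo-success, one pseudo-failure
    visits: int = 2

    @property
    def p_action(self) -> float:
        return self.actions / self.visits

    def update(self, action_required: bool) -> None:
        self.visits += 1
        self.actions += int(action_required)

def should_wait(stats: POIStats, t_compute: float, t_detour: float) -> bool:
    """Hover iff the expected detour cost of leaving early
    exceeds the known cost of waiting out the computation."""
    return t_compute <= stats.p_action * t_detour

# Example: after 3 of 4 observed visits required an action, a 4 s
# computation is worth waiting out if a detour back costs 12 s of flight.
stats = POIStats()
for outcome in (True, True, False, True):
    stats.update(outcome)
print(should_wait(stats, t_compute=4.0, t_detour=12.0))  # True
```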
Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Zhang, Danyang, Chen, Lu, Zhang, Situo, Xu, Hongshen, Zhao, Zihan, Yu, Kai
Inspired by insights from cognitive science on human memory and reasoning mechanisms, we propose REMEMBERER, a novel evolvable agent framework based on Large Language Models (LLMs). By equipping the LLM with a long-term experience memory, REMEMBERER is capable of exploiting experiences from past episodes even for different task goals, which surpasses an LLM-based agent with fixed exemplars or with only a transient working memory. We further introduce Reinforcement Learning with Experience Memory (RLEM) to update the memory. Thus, the whole system can learn from the experiences of both success and failure, and evolve its capability without fine-tuning the parameters of the LLM; in this way, the proposed REMEMBERER constitutes a semi-parametric RL agent. Extensive experiments are conducted on two RL task sets to evaluate the proposed framework. The average results with different initialization and training sets exceed the prior SOTA by 4% and 2% in success rate on the two task sets, demonstrating the superiority and robustness of REMEMBERER.
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > New York > Richmond County > New York City (0.04)
- (7 more...)
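The RLEM idea above, reduced to a hedged sketch: a persistent table of (task, observation, action) records is updated with an ordinary Q-learning rule, and the most similar records are retrieved as prompt exemplars while the LLM's parameters stay frozen. The data layout, retrieval scoring, and the ExperienceMemory name are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

class ExperienceMemory:
    """Long-term memory updated by RL; the LLM itself is never fine-tuned."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9):
        self.q = defaultdict(float)          # (task, obs, action) -> Q value
        self.alpha, self.gamma = alpha, gamma

    def update(self, task, obs, action, reward, next_obs, next_actions):
        """One Q-learning step over a stored transition."""
        key = (task, obs, action)
        best_next = max((self.q[(task, next_obs, a)] for a in next_actions),
                        default=0.0)
        target = reward + self.gamma * best_next
        self.q[key] += self.alpha * (target - self.q[key])

    def exemplars(self, task, obs, similarity, k: int = 3):
        """Return the k records most similar to (task, obs); high-Q records
        suggest encouraged actions, low-Q records discouraged ones.
        `similarity` is a caller-supplied scoring function (assumed)."""
        scored = [(similarity((task, obs), (t, o)), t, o, a, q)
                  for (t, o, a), q in self.q.items()]
        scored.sort(key=lambda r: r[0], reverse=True)
        return scored[:k]
```

The retrieved exemplars would be formatted into the LLM prompt, so the agent improves purely through its non-parametric memory.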
Federated Reinforcement Distillation with Proxy Experience Memory
Cha, Han, Park, Jihong, Kim, Hyesung, Kim, Seong-Lyun, Bennis, Mehdi
In distributed reinforcement learning, it is common to exchange the experience memory of each agent and thereby collectively train their local models. The experience memory, however, contains all of the host agent's preceding state observations and their corresponding policies, which may violate the agent's privacy. To avoid this problem, we propose a privacy-preserving distributed reinforcement learning (RL) framework, termed federated reinforcement distillation (FRD). The key idea is to exchange a proxy experience memory comprising a pre-arranged set of states and time-averaged policies, thereby preserving the privacy of actual experiences. Based on an advantage actor-critic RL architecture, we numerically evaluate the effectiveness of FRD and investigate how its performance is affected by the proxy memory structure and different memory exchange rules.
- Europe > Finland > Northern Ostrobothnia > Oulu (0.05)
- North America > United States > New York (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- (2 more...)
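A hedged sketch of the proxy experience memory described above: visited states are mapped onto a pre-arranged proxy-state set, and only per-proxy, time-averaged policies are exported for exchange, never the raw experiences. The nearest-centroid mapping and the ProxyMemory interface are illustrative assumptions.

```python
import numpy as np

class ProxyMemory:
    def __init__(self, proxy_states: np.ndarray, n_actions: int):
        # proxy_states: (P, state_dim), agreed upon in advance by all agents
        self.proxy_states = proxy_states
        self.policy_sum = np.zeros((len(proxy_states), n_actions))
        self.counts = np.zeros(len(proxy_states))

    def record(self, state: np.ndarray, policy: np.ndarray) -> None:
        """Accumulate the local policy under the nearest proxy state."""
        i = int(np.argmin(np.linalg.norm(self.proxy_states - state, axis=1)))
        self.policy_sum[i] += policy
        self.counts[i] += 1

    def export(self) -> np.ndarray:
        """Time-averaged policies per proxy state; this array, not the raw
        experience memory, is what gets exchanged between agents.
        Never-visited proxy states export all-zero rows."""
        counts = np.maximum(self.counts, 1)[:, None]
        return self.policy_sum / counts
```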
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Rafati, Jacob, Noelle, David C.
Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse, delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action-selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with learning the corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal that marks subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the agent's most recent experiences. When combined with an intrinsic motivation learning mechanism, this method learns subgoals and skills together, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment and is suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse, delayed feedback: a variant of the rooms environment and the Atari 2600 game Montezuma's Revenge.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Merced County > Merced (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area > Neurology (0.93)
- Leisure & Entertainment > Games (0.88)
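A hedged sketch in the spirit of the model-free subgoal discovery above: an incremental k-means update pulls centroids toward the agent's recent experiences, and the resulting centroids serve as candidate subgoals for intrinsic-motivation learning. The per-cluster learning-rate schedule and the SubgoalDiscovery name are assumptions, not the authors' exact method.

```python
import numpy as np

class SubgoalDiscovery:
    def __init__(self, n_subgoals: int, state_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.centroids = rng.normal(size=(n_subgoals, state_dim))
        self.counts = np.ones(n_subgoals)

    def observe(self, state: np.ndarray) -> None:
        """Incremental k-means step: pull the nearest centroid toward each
        experience as it enters the agent's recent-experience memory."""
        i = int(np.argmin(np.linalg.norm(self.centroids - state, axis=1)))
        self.counts[i] += 1
        lr = 1.0 / self.counts[i]            # per-cluster decaying rate
        self.centroids[i] += lr * (state - self.centroids[i])

    def subgoals(self) -> np.ndarray:
        """Centroids double as candidate subgoals; the HRL agent would grant
        an intrinsic reward whenever a state lands near one of them."""
        return self.centroids
```

No environment model is learned anywhere here, which is the point: subgoal candidates emerge purely from clustering the stream of visited states.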